Feature/client daemon refactoring#256

Merged
joelteply merged 20 commits into main from feature/client-daemon-refactoring
Jan 13, 2026

Conversation

@joelteply
Contributor

Summary

Brief description of changes and why they're needed

Change Type & Scale

  • 🐛 Bug fix (fixes an issue)
  • ✨ New feature (adds functionality)
  • 📚 Documentation (README, guides, comments)
  • 🔧 Configuration (ESLint, CI, build tools)
  • 🧹 Cleanup (refactor, remove dead code)
  • 💥 Breaking change (existing functionality changes)

Scale:

  • 📏 Small (<50 files changed)
  • 📐 Medium (50-200 files changed)
  • 📊 Large integration (200+ files changed)

Testing & Verification

  • ✅ Local testing completed
  • 🧪 npm run lint passes
  • 🔍 npm test passes
  • 📸 Screenshot/visual verification (if UI changes)
  • 🤖 Portal commands tested: python python-client/ai-portal.py --cmd tests

AI Development Notes

  • 🔄 Maintains backward compatibility with existing AI agents
  • 📋 Updated CLAUDE.md if process changes
  • 🎯 Follows modular architecture principles
  • 🚨 Emergency verification system still works

Status & Readiness

  • 🟢 Ready to merge (all checks pass, no known issues)
  • 🟡 Merge with caution (some issues documented below)
  • 🔴 Do not merge yet (major issues need resolution)

Known Issues: (if any)

Files Changed

List key files and why they changed

Related Issues

Fixes #(issue) or Relates to #(issue)


For AI Agents: Use python python-client/ai-portal.py --dashboard to verify system health after merging

Joel and others added 19 commits January 10, 2026 22:56
- Create worker_pool.rs with multi-instance model loading
- Each worker has own QuantizedModelState + Metal GPU device
- Request channel distributes work via tokio mpsc
- Semaphore tracks available workers for backpressure
- Auto-detect worker count based on system memory (~2GB per worker)
- Update InferenceCoordinator: maxConcurrent 1→3, reduced cooldowns
- Fallback to single BF16 instance when LoRA adapters requested

Before: 1 request/~6s + 30s timeout cascade
After: 4 requests/~6s in parallel, no timeouts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
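The backpressure pattern this commit describes (a semaphore tracking available workers, with a channel distributing requests) comes from the Rust/tokio worker pool. A minimal TypeScript analog, purely for illustration — the class and function names here are assumptions, not the actual worker_pool.rs API:

```typescript
// A counting semaphore gates requests so at most `permits` tasks run at
// once, mirroring the tokio::sync::Semaphore used by the worker pool.
class Semaphore {
  private waiters: Array<() => void> = [];
  constructor(private permits: number) {}

  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the permit directly to the next waiter
    else this.permits++;
  }
}

// Run a task on the pool: block until a worker slot is free, then
// always release the slot when the task settles.
async function runOnPool<T>(sem: Semaphore, task: () => Promise<T>): Promise<T> {
  await sem.acquire();
  try {
    return await task();
  } finally {
    sem.release();
  }
}
```

With 3 workers (maxConcurrent 1→3), a fourth request simply waits for a permit instead of timing out against a busy single instance.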
- Daemons start in dependency order: critical → integration → lightweight
- Critical path (data, command, events, session): 350ms max
- Integration daemons wait for DataDaemon before starting
- Lightweight daemons (health, widget, logger) start immediately
- Phase breakdown metrics logged for observability

Phases:
- critical: 4 daemons, max=207ms (UI can render)
- integration: 7 daemons, max=3518ms (AIProvider bottleneck)
- lightweight: 7 daemons, max=130ms

Total startup: 3531ms (critical path ready much sooner)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
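The wave-based startup above can be sketched as follows. This is a simplified illustration (in the real orchestrator, lightweight daemons start immediately rather than as a trailing wave), and the `startInWaves` name and daemon shape are assumptions:

```typescript
// Start daemons in dependency waves: every daemon in a wave starts in
// parallel, and the next wave begins only after the current one resolves.
type Daemon = { name: string; start: () => Promise<void> };

async function startInWaves(waves: Daemon[][]): Promise<Record<string, number>> {
  const phaseMs: Record<string, number> = {};
  for (const [i, wave] of waves.entries()) {
    const t0 = Date.now();
    await Promise.all(wave.map((d) => d.start()));
    // Phase breakdown logged for observability, as in the commit message.
    phaseMs[`wave${i}`] = Date.now() - t0;
  }
  return phaseMs;
}
```

The benefit is that the critical wave (data, command, events, session) gates UI rendering at ~207ms, while the slow integration wave (AIProvider at ~3.5s) no longer blocks it.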
- Fix SystemOrchestrator navigate command (remove invalid --path param)
- Fix launch-and-capture.ts: check ping, refresh if connected, open if not
- Fix SystemMetricsCollector countCommands to use ping instead of browser logs
- Add deterministic UUIDs for seeded users (Joel, Claude Code)
- Improve UserDaemonServer error logging for persona client creation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Browser launch:
- ALWAYS open browser window, don't just reload
- Ensures user sees something even if WebSocket connected but window closed
- Both SystemOrchestrator and launch-and-capture now open browser unconditionally

UUID fix:
- stringToUUID now generates valid 36-char UUIDs (was generating 32-char)
- Last segment now correctly 12 chars instead of 8

ChatWidget:
- Use server backend for $in queries (localStorage doesn't support $in)
- Add debug logging for member loading

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
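The UUID fix hinges on the canonical 8-4-4-4-12 layout: 32 hex chars plus 4 hyphens = 36 chars, with a 12-char final segment (the bug truncated it to 8). A hypothetical reconstruction — the FNV-1a-style hash here is illustrative only, not the project's actual hashing scheme:

```typescript
// Deterministically map a seed string (e.g. "Joel") to a valid
// 36-character UUID in 8-4-4-4-12 form. Assumes a non-empty input.
function stringToUUID(input: string): string {
  // Derive 32 hex chars from the input with a simple FNV-1a-style mix
  // (illustrative hash, NOT the real implementation).
  let hex = "";
  let h = 0x811c9dc5;
  for (let i = 0; hex.length < 32; i++) {
    h ^= input.charCodeAt(i % input.length) + i;
    h = Math.imul(h, 0x01000193) >>> 0;
    hex += h.toString(16).padStart(8, "0");
  }
  hex = hex.slice(0, 32);
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32), // 12 chars -- the segment the bug cut to 8
  ].join("-");
}
```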
- DataReadBrowserCommand now supports backend:'server' to bypass localStorage
- ChatWidget uses server backend for room queries to avoid stale cache
- Fixes issues with members not loading due to localStorage not supporting $in

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CONSOLIDATION (Ministry of Code Deletion):
- RoutingService is now THE single source of truth for room/user resolution
- Added resolveRoomIdentifier() and resolveUserIdentifier() convenience functions
- Added name fallback query for legacy support
- Migrated ChatSendServerCommand: deleted findRoom(), uses RoutingService
- Migrated ChatAnalyzeServerCommand: deleted resolveRoom(), uses RoutingService
- Migrated ChatPollServerCommand: deleted inline resolution, uses RoutingService
- WallTypes.isRoomUUID() now delegates to RoutingService.isUUID()
- MainWidget: deleted dead handleTabClick/handleTabClose, simplified openContentTab

ETHOS (CLAUDE.md):
- Added "The Compression Principle" - one logical decision, one place
- Added "The Methodical Process" - 8 mandatory steps, outlier validation
- Encoded the Ministry philosophy: deletion without loss = compression = efficiency

Net change: +306/-298 lines (compression-neutral while adding documentation)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
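The consolidated resolution logic can be sketched like this. `resolveRoomIdentifier` and `isUUID` are named in the commit; the store shape and lookup mechanics are assumptions for illustration:

```typescript
// The one place room identifier resolution lives: accept either a UUID
// or a room name (legacy fallback) and return the canonical room id.
interface Room { id: string; name: string }

const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isUUID(value: string): boolean {
  return UUID_RE.test(value);
}

function resolveRoomIdentifier(
  identifier: string,
  rooms: Room[]
): string | undefined {
  if (isUUID(identifier)) return identifier;
  // Name fallback query for legacy callers.
  return rooms.find((r) => r.name === identifier)?.id;
}
```

The point of the consolidation is that chat-send, chat-analyze, and chat-poll all delegate here instead of carrying their own `findRoom()`/`resolveRoom()` copies.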
Key fixes:
- data-clear.ts: Clear session metadata during reseed to prevent stale
  entityIds from persisting (root cause of corrupted UUID bug)
- MainWidget.ts: Add userId setup with retry in openContentFromUrl() and
  initializeContentTabs() to ensure ContentService can persist to database
- RoutingService.ts: Fix example UUID in comment
- SchemaBasedFactory.ts: Fix hardcoded test UUID

The corrupted UUID issue (5e71a0c8-0303-4eb8-a478-3a121248) was caused by
stale session metadata files that weren't cleared during data reseed.
The session files stored old entityIds that no longer existed after
reseeding the database.

ContentState persistence now works - tabs are saved to database with
correct UUIDs. Tab restore on refresh still needs investigation due to
session management timing issues.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: Browser's OfflineStorageAdapter caches user_states in
localStorage. When tabs are opened, the server database is updated,
but localStorage retains stale data. On page refresh, loadUserContext()
would get old cached data with fewer/no openItems.

Fix: Add `backend: 'server'` to user_states query in loadUserContext()
to bypass localStorage cache and always fetch fresh contentState from
the server database.

Also added debug logging (temporary) to help diagnose initialization
timing issues between loadUserContext() and initializeContentTabs().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
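The backend-selection rule described in these two commits — explicit `backend: 'server'` bypasses the cache, and `$in` queries must go to the server because localStorage can't evaluate them — can be sketched as:

```typescript
// Decide where a data read executes. All names here are assumptions
// sketching the behaviour, not the actual DataReadBrowserCommand API.
type Backend = "local" | "server";

interface ReadOptions {
  filter: Record<string, unknown>;
  backend?: Backend; // explicit override, e.g. for fresh user_states
}

function chooseBackend(opts: ReadOptions): Backend {
  // Caller demanded fresh server state (bypass localStorage cache).
  if (opts.backend === "server") return "server";
  // The localStorage adapter cannot evaluate $in, so route those queries
  // to the server database.
  const usesIn = Object.values(opts.filter).some(
    (v) => typeof v === "object" && v !== null && "$in" in (v as object)
  );
  return usesIn ? "server" : "local";
}
```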
- inference-grpc: Fix dead code, use pool stats, proper strip_prefix
- data-daemon: Fix HDD acronym, add type alias for complex type
- inference: Collapse nested if-let
- model.rs: Use struct literal instead of removed new()

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: UserDaemon's initializeDeferred() tried to create persona
clients before JTAGSystem.daemons was populated. DataDaemon emits
system:ready during its initialize() phase, triggering UserDaemon's
ensurePersonaClients() which needs CommandDaemon. But CommandDaemon
wasn't yet registered to JTAGSystem.daemons (only happens AFTER
orchestrator.startAll() returns).

Fix:
1. CommandDaemonServer now registers itself to globalThis during its
   initialize() phase, providing early access for other daemons
2. JTAGSystem.getCommandsInterface() now checks globalThis first,
   falling back to this.daemons for compatibility

Also fixed Clippy duplicate_mod warning in training-worker:
- logger_client.rs now re-exports JTAG protocol types
- messages.rs uses re-exports instead of including jtag_protocol directly

Verified: All 11 AI personas now healthy and responding to messages.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Per-persona inference logging confirmed (Helper AI, Teacher AI, etc.)
- System utilities correctly show [unknown] in Rust logs
- AI responses verified working via Candle gRPC and cloud APIs
- Version bump 1.0.7184

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1. SignalDetector: Switch from local (slow) to Groq (fast)
   - Was flooding local inference queue with classification calls
   - Groq responds in <1s vs local ~10s
   - Frees local queue for actual persona responses

2. CandleGrpcAdapter: Add prompt truncation (24K char limit)
   - Prevents "narrow invalid args" tensor dimension errors
   - Large RAG contexts were sending 74000+ char prompts
   - Model has 8K token (~32K char) context window
   - Truncation preserves system prompt + recent messages

Before: Constant queue backlog, tensor errors, hangs
After: Workers have availability, no tensor errors, faster responses

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
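The truncation policy — keep the system prompt intact, drop the oldest messages first, stay under the 24K-char limit — can be sketched as below. The constant comes from the commit message; the function shape is an assumption:

```typescript
// Fit a prompt into the adapter's character budget: the system prompt is
// always preserved, then messages are kept newest-first until the budget
// is exhausted (so large RAG contexts shed their oldest content).
const MAX_PROMPT_CHARS = 24_000;

function truncatePrompt(
  system: string,
  messages: string[],
  limit = MAX_PROMPT_CHARS
): string {
  let budget = limit - system.length;
  const kept: string[] = [];
  // Walk from the most recent message backwards, keeping what fits.
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].length > budget) break;
    budget -= messages[i].length;
    kept.unshift(messages[i]);
  }
  return system + kept.join("");
}
```

A 74,000-char prompt against an ~32K-char context window is what produced the tensor-dimension errors; clamping at 24K leaves headroom for the response.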
The chat widget was unlatching from the bottom when large messages
arrived because the fixed 200px threshold was too small.

Changes:
- Add isLatchedToBottom state to track user intent
- Dynamic threshold: max of config, 50% viewport, or 500px
- ResizeObserver checks latch state instead of distance
- Scroll handler updates latch with tighter 100px threshold
- Scroll listener active when autoScroll enabled

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
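The latch logic above reduces to two pure functions (DOM wiring via ResizeObserver and scroll listeners omitted; names are illustrative):

```typescript
// Unlatch threshold grows with content size: the max of the configured
// value, 50% of the viewport, or 500px, so a single large message can't
// push the user "out of range" of the bottom.
function dynamicThreshold(configPx: number, viewportPx: number): number {
  return Math.max(configPx, viewportPx * 0.5, 500);
}

// The scroll handler uses a tighter 100px band to read user intent:
// latched only while the user is effectively at the bottom.
function isLatchedToBottom(
  scrollTop: number,
  scrollHeight: number,
  clientHeight: number,
  thresholdPx = 100
): boolean {
  return scrollHeight - (scrollTop + clientHeight) <= thresholdPx;
}
```

The key design change is that the ResizeObserver consults the stored latch state rather than re-measuring distance, so growth of the content itself can never flip the latch.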
The scrollToEnd was called immediately after adding items to DOM,
but the browser hadn't laid them out yet. Using double-rAF ensures
the DOM is fully rendered before scrolling.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings January 13, 2026 23:16
@joelteply joelteply merged commit 5a438bb into main Jan 13, 2026
3 of 5 checks passed
@joelteply joelteply deleted the feature/client-daemon-refactoring branch January 13, 2026 23:18
Copilot AI (Contributor) left a comment


Pull request overview

This PR appears to be a major refactoring focused on daemon/worker infrastructure improvements, including memory management for inference workers, worker pool implementation for concurrent inference, improved process management scripts, and extensive cleanup of deprecated test/script files.

Changes:

  • Added memory limits and worker pool support for inference workers
  • Refactored Rust worker modules to avoid duplicate code and improve structure
  • Improved shell scripts for starting/stopping workers with better process tracking
  • Updated TypeScript widgets and system core for better content state management
  • Removed 40+ deprecated test and script files

Reviewed changes

Copilot reviewed 142 out of 144 changed files in this pull request and generated 1 comment.

Summary per file:

  • workers-config.json: Added memory limits configuration and per-worker memory settings
  • workers/training/src/messages.rs: Refactored to re-export protocol types from logger_client
  • workers/stop-workers.sh: Updated to use binary names from config for process termination
  • workers/start-workers.sh: Added memory limit parsing and per-worker log files
  • workers/shared/logger_client.rs: Added worker pool, GPU synchronization re-exports of JTAG protocol types
  • workers/inference-grpc/*: Added worker pool, GPU synchronization, persona tracking
  • widgets/*: Improved content state management and user identifier handling
  • system/core/*: Added DaemonOrchestrator for wave-based parallel startup
  • system/routing/*: Added room name lookup and server-side resolution functions
  • Multiple scripts/*: Deleted 40+ deprecated test/utility scripts
Files not reviewed (1)
  • src/debug/jtag/package-lock.json: Language not supported


```ts
  queuedMessages: daemon.startupQueueSize
});

const totalMs = endTime - startTime;
```

Copilot AI Jan 13, 2026


Unused variable `totalMs`.

